Partitional Clustering of Protein Sequences - An Inductive Logic Programming Approach

Authors

  • Nuno A. Fonseca
  • Vítor Santos Costa
  • Rui Camacho
  • Cristina P. Vieira
  • Jorge Vieira
Abstract

We present a novel approach to clustering sets of protein sequences, based on Inductive Logic Programming (ILP). Preliminary results show that the proposed method produces understandable descriptions and explanations of the clusters. Furthermore, it can be used as a knowledge elicitation tool to explain clusters proposed by other clustering approaches, such as standard phylogenetic programs.

Similar resources

Relational Sequence Alignments and Logos

The need to measure sequence similarity arises in many application domains and often coincides with sequence alignment: the more similar two sequences are, the better they can be aligned. Aligning sequences not only shows how similar sequences are, it also shows where there are differences and correspondences between the sequences. Traditionally, the alignment has been considered for sequence...

Top-Down Induction of Clustering Trees

An approach to clustering is presented that adapts the basic top-down induction of decision trees method towards clustering. To this aim, it employs the principles of instance-based learning. The resulting methodology is implemented in the TIC (Top down Induction of Clustering trees) system for first-order clustering. The TIC system employs the first-order logical decision tree representation o...

"Say EM" for Selecting Probabilistic Models for Logical Sequences

Many real-world sequences, such as protein secondary structures or shell logs, exhibit rich internal structure. Traditional probabilistic models of sequences, however, consider sequences of flat symbols only. Logical hidden Markov models have been proposed as one solution. They deal with logical sequences, i.e., sequences over an alphabet of logical atoms. This comes at the expense of a more c...

Accurate Prediction of Protein Functional Class from Sequence in the M. tuberculosis and E. coli Genomes using Data Mining

The analysis of genomics data needs to become as automated as its generation. Here we present a novel data-mining approach to predicting protein functional class from sequence. This method is based on a combination of inductive logic programming clustering and rule learning. We demonstrate the effectiveness of this approach on the M. tu...

Learning functional logic classification concepts from databases

In this paper we examine the possibilities, advantages and shortcomings of addressing different data-mining problems with the Inductive Functional Logic Programming (IFLP) paradigm. As a functional extension of the Inductive Logic Programming (ILP) approach, IFLP has all the advantages of the latter but the potential of a more natural representation language for classification, clustering and f...

Publication date: 2009